Regularized Online Learning
نویسندگان
چکیده
Recently, Word2Vec tool has attracted a lot of interest for its promising performances in a variety of natural language processing (NLP) tasks. However, a critical issue is that the dense word representations learned in Word2Vec are lacking of interpretability. It is natural to ask if one could improve their interpretability while keeping their performances. Inspired by the success of sparse models in enhancing interpretability, we propose to introduce sparse constraint into Word2Vec. Specifically, we take the Continuous Bag of Words (CBOW) model as an example in our study and add the ` l regularizer into its learning objective. One challenge of optimization lies in that stochastic gradient descent (SGD) cannot directly produce sparse solutions with ` 1 regularizer in online training. To solve this problem, we employ the Regularized Dual Averaging (RDA) method, an online optimization algorithm for regularized stochastic learning. In this way, the learning process is very efficient and our model can scale up to very large corpus to derive sparse word representations. The proposed model is evaluated on both expressive power and interpretability. The results show that, compared with the original CBOW model, the proposed model can obtain state-of-the-art results with better interpretability using less than 10% non-zero elements.
منابع مشابه
Cycles in Adversarial Regularized Learning
Regularized learning is a fundamental technique in online optimization, machine learning and many other fields of computer science. A natural question that arises in these settings is how regularized learning algorithms behave when faced against each other. We study a natural formulation of this problem by coupling regularized learning dynamics in zero-sum games. We show that the system’s behav...
متن کاملRegularized Distance Metric Learning: Theory and Algorithm
In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric l...
متن کاملOnline Learning with Regularized Kernel for One-class Classification
This paper presents an online learning with regularized kernel based one-class extreme learning machine (ELM) classifier and is referred as “online RK-OC-ELM”. The baseline kernel hyperplane model considers whole data in a single chunk with regularized ELM approach for offline learning in case of one-class classification (OCC). Further, the basic hyper plane model is adapted in an online fashio...
متن کاملL1 Regularized Linear Temporal Difference Learning
Several recent efforts in the field of reinforcement learning have focused attention on the importance of regularization, but the techniques for incorporating regularization into reinforcement learning algorithms, and the effects of these changes upon the convergence of these algorithms, are ongoing areas of research. In particular, little has been written about the use of regularization in onl...
متن کاملDual Averaging Method for Regularized Stochastic Learning and Online Optimization
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as l1-norm for promoting sparsity. We develop a new online algorithm, the regularized dual averaging (RDA) method, that can explicitly exploit the regularizatio...
متن کاملOnline Learning Via Regularized Frequent Directions
Online Newton step algorithms usually achieve good performance with less training samples than first order methods, but require higher space and time complexity in each iteration. In this paper, we develop a new sketching strategy called regularized frequent direction (RFD) to improve the performance of online Newton algorithms. Unlike the standard frequent direction (FD) which only maintains a...
متن کامل